Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries

نویسندگان

  • Simon Jonassen
  • Svein Erik Bratsberg
چکیده

In this paper we look at a combination of bulk-compression, partial query processing and skipping for document-ordered inverted indexes. We propose a new inverted index organization, and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both our methods significantly reduce the number of processed elements and reduce the average query latency by more than three times. Our experiments with a real implementation and a large document collection are valuable for a further research within inverted index skipping and query processing optimizations.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ranked Document Retrieval in (Almost) No Space

Ranked document retrieval is a fundamental task in search engines. Such queries are solved with inverted indexes that require additional 45%-80% of the compressed text space, and take tens to hundreds of microseconds per query. In this paper we show how ranked document retrieval queries can be solved within tens of milliseconds using essentially no extra space over an in-memory compressed repre...

متن کامل

Using Bitmap Indexing Technology for Combined Numerical and Text Queries

In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple que...

متن کامل

WORLDCOMP'12 Typing Instructions for Preparation of Final Camera-ready Papers

In a text database, a set of documents is maintained. To enquiry such a database, two kinds of queries are quite often used. One is the so-called conjunctive query, represented by a set of terms connected by conjunction (); and the other is the disjunctive query, which is also a set of terms, but connected by disjunction (). In this paper, we discuss an efficient and effective index mechanism...

متن کامل

Simon Jonassen Efficient Query Processing in

Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, wh...

متن کامل

Compressed Inverted Indexes for In-Memory Search Engines

We present the algorithmic core of a full text data base that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted index based search data structures. It outperforms suffix array based techniques for all the above operations for real wo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011